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Abstract 

We show how thermodynamic properties of molecular models can be computed over a 
large, multidimensional parameter space by combining multistate reweighting analysis with a 
linear basis function approach. This approach reduces the computational cost to estimate ther¬ 
modynamic properties from molecular simulations for over 130,000 tested parameter combi¬ 
nations from over a thousand CPU years to tens of CPU days. This speed increase is achieved 
primarily by computing the potential energy as a linear combination of basis functions, com¬ 
puted from either modified simulation code or as the difference of energy between two ref¬ 
erence states, which can be done without any simulation code modification. The thermody¬ 
namic properties are then estimated with the Multistate Bennett Acceptance Ratio (MBAR) 
as a function of multiple model parameters without the need to define a priori how fhe sfafes 
are connecfed by a pafhway. Insfead, we adapfively sample a sef of poinfs in paramefer space 
to create mutual configuration space overlap. The existence of regions of poor configuration 
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space overlap are detected by analyzing the eigenvalues of the sampled states’ overlap matrix. 
The configuration space overlap to sampled states is monitored alongside the mean and max¬ 
imum uncertainty to determine convergence, as neither the uncertainty or the configuration 
space overlap alone is a sufficient metric of convergence. 

This adaptive sampling scheme is demonstrated by estimating with high precision the sol¬ 
vation free energies of charged particles of Lennard-Jones plus Coulomb functional form with 
charges between -2 and +2 and generally physical values of a,y and Sij in TIP3P water. We 
also compute entropy, enthalpy, and radial distribution functions of arbitrary unsampled pa¬ 
rameter combinations using only the data from these sampled states and use the estimates of 
free energies over the entire space to examine the deviation of atomistic simulations from the 
Born approximation to the solvation free energy. 


1 Introduction 

Many applications of molecular simulations require searehing over large parameter spaees to pre- 
diet or mateh physieal observables. Moleeular simulation parameters sueh as eharges, Lennard- 
Jones dispersion and repulsion parameters, as well as bonds, angles, and torsion foree eonstants 
determine the energies and probabilities of configurations in simulations, and thus in turn, deter¬ 
mine what thermodynamie properties will be observed. The ability to accurately estimate ther- 
modynamie properties without the need for laboratory experiments has the potential to save both 
time and resourees in fields sueh as polymer^ and solventP design as well as drug diseovery.l^ 
This time and eost savings is important both in the design of new moleeules, where properties 
are unknown, and ’reverse property predietion’ where a model or moleeule is designed to match 
specific experimental targets, sueh as designing metal-organic frameworks (MOFs) with specifie 
gas loadings.l^ 

Assigning proper parameters in a moleeular simulation given a set of experimental data be- 
eomes more diffieult as the number of free parameters inereases. Some experimental parameters, 
such as bond lengths and angles, are relatively easy to estimate using small-moleeule crystal- 
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lographic structures and quantum chemistry. However, nonbonded parameters, such as partial 
charges and dispersion terms, are much more difficult to choose as they are model parameters 
that do not directly correspond to laboratory observables such as bond length. Instead, possible 
nonbonded parameters are constrained by sets of experimental observables such as transfer free 
energies, heats of vaporization, densities, and heat capacities. 

Identifying nonbonded model parameters consistent with a set of experimental thermodynamic 
values requires expensive iterative, self-consistent simulations or even more expensive gradient op¬ 
timizations.Most condensed phase force fields^^^ are parameterized by manual iterative fitting 
to a training set of experimental thermodynamic data for a small set of molecules chosen to repre¬ 
sent a broader spectrum of similar molecules Accurate fits are required to predict properties of 

biological systems or complex mixtures where group contribution methods such as UNIQUACP^ 
and UNIFAC^ are inadequate. 

Some of the most computationally expensive properties to estimate involve the free energy dif¬ 
ferences between two states, such as the solvation free energy, which is the free energy difference 
of two systems as one solute molecule moves from solution to vapor, or activity coefficients, which 
measure the deviation of the chemical potential of a species from ideality. Accurately computing 
free energy differences (or equivalently, chemical potential differences) also provide a way to com¬ 
pute many other thermodynamic properties as they can be derived from the derivatives of the free 
energy with respect to temperature (T), pressure (P), volume (V), and number of particles (A,). 

Estimating free energy differences between two thermodynamic states accurately requires de¬ 
signing a thermodynamic path between the states. Paths that are both computationally and statisti¬ 
cally efficient, especially for full deletion or insertion of a molecule into a dense fluid, are nontrivial 
to design and must often include a number of non-obvious, nonphysical intermediatesThe 
end states and multiple intermediate states along the thermodynamic path must be sampled to cre¬ 
ate good configuration space overlap between the end states, which is required to accurately esti¬ 
mate free energy differences between the endpoints J ^l i9 | 34t {37] Technically, we require good overlap 
in the full phase space, both configurations and velocity, but because velocities are thermalized 
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in essentially all systems of thermodynamie interest, it is the eonfiguration spaee that we must 
generally worry about when eonneeting states together. 

If one wishes to eompute free energies of solvation for many different parameterizations of 
the same moleeule, the direet approaeh of ealeulating an entire thermodynamie pathway for eaeh 
parameter ehoiee is extremely expensive for highly aeeuraey ealeulations. Additionally, examining 
the differenees in free energies due to a small ehange of parameters is partieularly diffieult beeause 
we must take the differenees between two similar numbers with independent statistieal error. 

Reweighting methods ean help solve both the problem of expense and the problem of ean- 
eellations of errors. In a reeent study, our group showed how multistate reweighting ean direetly 
ealeulate the AAG between two different long-range interaetion approaehes, with very small uneer- 
tainties for relatively low eomputational eost.l^ The expense is lowered by eonstrueting thermo¬ 
dynamie eyeles direetly eonneeting Hamiltonians with similar parameters and thus signifieantly 
overlapping eonfigurational spaees. These small uneertainties are possible beeause MBAR ean 
direetly ealeulate the eovarianees between the two free energies through analyzing all potential 
energy differenees, rather than only uneorrelated ealeulations. This same approaeh ean be applied 
to small ehanges in parameters. 

Estimating properties with reweighting methods requires eonstrueting a thermodynamie path 
between the different parameterizations. It also requires potentially signifieant eomputational re- 
sourees to perform simulations with parameters that have properly overlapping eonfigurations. The 
eombination of these two requirements adds theoretieal and praetieal limitations to simultaneously 
searehing large, multidimensional nonbonded parameter spaees. 

1. The spaee of nonbonded parameters is often at least multiple dimensions per particle or 
partiele type. For example, the nonbonded parameters of eharge {q) and least two Lennard- 
Jones-like terms (C/y, Oij) ean result in at least three parameter dimensions per partiele type. 

2. There is no obvious way to define eomputationally effieient thermodynamie paths between 
any two points in these multidimensional spaees or seleet a prior simulation points in this 
spaee that give rise to low error estimates of thermodynamie properties aeross the entire 
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space. 


3. Reweighting methods requires eomputing energies from the sampled eonfigurations to other 
sampled states, and any unsampled state of interest. This re-eomputation typieally requires 
re-running the simulation foree loops over all generated eonfigurations for eaeh eombination 
of parameters of interest. The eomputational eost to seareh sueh a multidimensional spaee 
of nonbonded parameters seales, at best, linearly with the number of samples, and at worst 
quadratieally with the number parameter eombinations, sinee data may be eolleeted with 
simulations at eaeh parameter eombination.l^ 

Designing effieient thermodynamie paths through arbitrary thermodynamie states is a ehal- 
lenging task,^2Zl29l30] designing paths in multidimensional parameter spaees adds additional 
eomplexities. An example in Fig. demonstrates the ehallenges of identifying low-uneertainty 
paths in multiple dimensions. This figure shows two arbitrarily defined thermodynamie states in a 
two-dimensional parameter spaee and attempts to draw pathways between them with high mutual 
eonfiguration spaee overlap, providing low error estimates. However, the ehoiee of whieh path to 
sample to eonneet the states is not immediately obvious. The shortest Euelidean path in parameter 
spaee has large uneertainty, but two alternative paths have low uneertainty. This sort of multidi¬ 
mensional spaee raises questions with no obvious answers: How ean we a priori identify whieh 
paths have more mutual overlap (i.e. result in simulations with lower uneertainty) without exhaus¬ 
tively sampling the system? Could samples drawn from both paths but with different proportions 
provide lower uneertainty than sampling either path by itself? 
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Figure 1: Defining the “best” thermodynamic path between arbitrary states is a nontrivial 
problem. This figure shows several multidimensional thermodynamic paths connecting two states, 
and the relative uncertainty of each pathway shown by a gray scale gradient of each curve. The 
path with shortest Euclidean distance in parameter space (Ai, X 2 ) has a large uncertainty, and at 
least two of the other paths have lower uncertainty. However, it is unclear if sampling a single 
“best path,” or a combination of multiple low variance paths will have the highest computational 
efficiency for a target statistical error. The most computationally efficient sampling scheme may 
not be along any single path at all, and instead may be achieved by sampling a non-parameterized 
states in a multidimensional parameter space. 


Previous research on identifying low-variance paths along a single coupling parameter used 
local minimization of total variance along the path.lAd220o| However, in multiple dimensions, since 
multiple potential low-variance paths could exist, local optimization is unlikely to identify the 
most efficient path between states of interest, nor whether it would be more optimal to traverse 
both paths. The identification of an optimal path choice is made more complicated if we are 
interested in all the mutual free energy differences between multiple states, since we must identify 
a path or a network of paths connecting all of the states, with samples collected along each of 
these states. One would ideally want ad hoc rules estimating the computational efficiency of these 
paths, determined a priori to avoid unnecessary sampling. Defining such rules for a diverse set of 
chemical systems becomes increasingly complex as the dimensionality and the number of states 
increases. Removing the need to define these difficult multidimensional paths significantly lowers 
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these barriers to searehing through large parameter spaees for optimal parameters. 

We ean remove the need to explieitly define thermodynamie paths between states in multiple 
dimensions with multistate reweighting methods. These methods, sueh as the Multistate Bennett 
Aeeeptanee Ratio (MBAR)l^and the Weighted Histogram Analysis Method (WHAM) allow 
analyzing samples from anywhere in the parameter spaee to eompute properties anywhere else in 
the spaee, without the need to define whieh thermodynamie states are adjaeent as is needed for the 
original Bennett Aeeeptanee Ratio (BAR).^MBAR has the important advantage over WHAM in 
that sampled eonfigurations do not need to be binned along a pre-defined thermodynamie path for 
the analysis whieh is partieularly important in multidimensional spaees, where it beeomes inereas- 
ingly diffieult to populate histogram bins. Any given eombination of nonbonded parameters will 
share a high degree of eonfiguration spaee overlap with a large number of similar parameter sets, 
though it is not always elear a priori whieh ones those will be. MBAR takes into aeeount all of the 
information from all sampled states in determining the free energy differenees between any two 
states, sampled or unsampled. Therefore, with moderate sampling at the right parameter eombi- 
nations, thermodynamie properties ean be estimated at a large number of parameter eombinations, 
even without explieit sampling of every parameter eombination, so long as for eaeh ehoiee of pa¬ 
rameters there is some degree of eonfiguration spaee overlap with some eombination of sampled 
parameters. 

With the limitation of explieitly defining a thermodynamie path removed, we ean foeus on 
deereasing the eomputational eost of gathering the statistieal information required for multistate 
reweighting ealeulations of thermodynamie information. Reweighting methods require the energy 
of eaeh sampled eonfiguration to be known at eaeh state of interest to eompute free energy differ¬ 
enees and other properties between states. Computing a eonfiguration’s energy multiple times with 
different energy funetions is a time limiting step for ealeulating thermodynamie properties aeross 
large parameter spaees. Colleeting the energy of eaeh eonfiguration at eaeh state usually requires 
running the simulation eode, or at least the inner foree loop of simulation eode, multiple times on 
eaeh eonfiguration. If we only are interested in reweighting the properties from a single sampled 
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state, ^ we would only have to earry out this eomputation onee per eonfiguration (N) generated at 
a sampled state (Ks), per unsampled state of interest (Ku). This results in energy ealeulations that 
seale as ^{N{Ku+Ks)), assuming equal sampling per sampled state. However, sueh estimates 
are known to be statistieally ineffieient and prone to substantial bias when overlap is not substan- 
tiai.lMSSI Multistate methods, like MB AR,I^ require ealeulating the energy for eaeh eonfiguration 
at eaeh sampled state as well as eaeh state of interest, so the sealing is quadratic in the number 
of sampled states as ^ {^N{Ku+Kj)). This is not a burdensome task for a small number of states 
along a single thermodynamic path, making it well worth using multistate simulations regardless 
of how expensive recalculating the energies of this configurations are. However, as tens or hun¬ 
dreds of thousands of states of interest and hundreds of sampled states are considered, this scaling 
becomes a computational bottleneck that must be overcome. 

We can reduce the cost to compute energies at thermodynamic states to computationally trivial 
vector multiplication by defining the energies using linear basis functions Energies calcu¬ 

lated using a basis function approach can be most generally written by 

n 

T ^unaffecteci(^) (1) 

i 

where u{r,X) = ^U{r,X) is the reduced energy as a function of both the configuration r and some 
(possibly multidimensional) alchemical coupling parameter A; hi{X) are a set of nonphysical, al¬ 
chemical switches that are independent of configuration; Ui{r) are the basis functions; n is the 
total number of basis function and alchemical switch pairs; and Munaffected(^) is the system’s po¬ 
tential energy not dependent on the alchemical variables. This approach computes the energy of a 
configuration at any thermodynamic state by scalar multiplication of the configuration dependent 
basis functions, which only have to be computed once per configuration. The vector multiplication 
can eliminate the need to run the inner force loop on a configuration more than once, reducing 
the computational cost of evaluating energies from ^ (^N{Ku + Kj)), to ^ which is simply 

the total number of sampled configurations. The alchemical switches can take any form, so long 
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as r remains part of the basis funetions and not the switehes. This form of the potential energy 
is in eontrast to forms such as the soft core form of repulsive interactions,which cannot be 
represented as sums of separable combinations of r and A. 

In this study we combine multistate reweighting methods with a linear basis function approach 
to compute thermodynamic properties over a large nonbonded parameter space. To demonstrate 
the process, we look at the Lennard-Jones parameters Su and Ou, and partial charge, qu for a single 
particle in explicit solvent. This approach could can aid in future large parameter space searches to 
quickly find a range of nonbonded parameters and fine tune a fitting or optimization procedure. The 
relative free energy, enthalpy, and entropy of solvation are explored as these are some of the most 
computationally expensive properties to estimate. We also estimate the Born solvation free energy 
of charging, compare our results to specific ion free energies computed from others, and compute 
radial distribution functions. The techniques shown here are generalizable to other thermodynamic 
properties. 

We explore both a two-dimensional and a three-dimensional parameter space. A 2-D parameter 
space in £/, and Ou is first explored as a test of searching through parameter space. A larger, 3- 
D parameter space in £ij, Ou, and qi is then also explored and iterative simulations are carried 
out to reduce the statistical error in the estimate of solvation properties across the entire range of 
parameters. 

2 Theory 

The notation in this study is as follows. On and Sn are the Lennard-Jones parameters, and q is 
the charge of a particle. Co is the permittivity of vacuum. u{r) are basis functions of the potential 
energy function and h{X) are the alchemical switches. Subscripts E, R, and A denote an electro¬ 
static, Lennard-Jones repulsive, and Lennard-Jones attractive term respectively, e.g. UE{r) is the 
electrostatic basis function. The subscripts z, j, and k on nonbonded parameters denote arbitrary 
atoms, and subscripts X,Y, and Z denote an explicit set of parameters which define a thermody- 
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namic state, and have fixed values but not explieitly defined to be general. Subseript S denotes a 
solvent particle. The subscript £ will be used for summation indices, and C represents a collection 
of constants. 


2.1 Representing nonbonded parameter space with basis functions 


We generalize the potential energy to simplify writing the energy of any potential in multidimen¬ 
sional space. The three nonbonded parameters explored here lead to a pairwise nonbonded poten¬ 
tial energy between two point particles a distance r apart 


u{r) 




A2 


r6 


+ 


‘2’;^ j 
4-7t£or' 


( 2 ) 


Eq. @ can be more generally written as 


u{r) 




r 



( 3 ) 

( 4 ) 


where n takes discrete values of 12, 6, or 1 depending on the index of £ and each C„ corresponds to 
the power of r~'\ The energy of a configuration at any point in parameter space is found by adding 
an alchemical switch, to each term of Eq. Q. Each can vary independently each other, 

allowing a multidimensional representation of the energy in terms of the parameters. The alchem¬ 
ical switches scale each of the 12, 6, and 1 terms to produce each of the target thermodynamic 
states. The total potential is then 


M(r, A) = Munaffected(?‘) +Y, ( ' (5) 

e \ ^ J ^ 

Computing the basis functions can be done either directly in code or in post-processing with 
fixed reference states. The most computationally efficient way to compute the basis functions 
would be to have the simulation package provide them at run time. However, most simulation 
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packages will not allow the user direet aeeess to the basis funetion values without heavily mod¬ 
ifying its eode, sinee usually only the total potential energy or the total A-dependent energy is 
required. An alternative solution whieh avoids any eode modification is to choose two fixed ref- 
erenee states and eompute the basis funetions as a differenee in energy, as was done in this study 
and explored below. This alternative approaeh means we must run the foree ealeulations at least 
three times for any sampled eonfigurations: onee while the samples are generated, and onee for 
eaeh referenee states. However, the computational cost is only & {3NKs), which still scales mueh 
better than {N{Ku-\-K^)) as Ku and Ks inerease. 

The potential ean be represented as linear eombination of alehemieal perturbations around a 
fixed referenee partiele at state X with respeet to a seeond referenee partiele at state Y as 


— ^unaffected (^) T ^ 


{l-hn{Xn))Cn,x{r)+hn{Xn)Cn,Y{r) 


— ^unaffected ) T ) T ^ 

I L 


^Cfi^XY ) 


( 6 ) 


\i 


where AC„^xy(^) = Cnj(r) —Cn,x{r) and ux(r) is the eomplete nonbonded pairwise potential for 
partiele X alone. The eomputed basis funetions are then ealeulated as the energy differenee be¬ 
tween the two referenee partieles, and the unmodified potential energy of partiele X beeomes part 
of Munaffected(^)- The potential energy at arbitrary state Z ean now be eomputed using this pertur¬ 
bation. This referenee state approaeh makes computing the basis funetions possible without major 
simulation eode ehanges. The numerieal error should be monitored for any round off error sinee 
two similar energies are subtraeted, as we diseuss in Seetion |0| 

Unlike with standard alehemieal transformations between A = 0 and A = 1, the aeeessible 
parameter spaee is not bounded by the referenee states. Consider an arbitrary state, Z, with param¬ 
eters outside the range of the parameters Cn,x and Cnj. The values of alehemieal switehes defining 
Z would then fall outside the standard [0,1] domain. States whieh fall outside this domain still have 
physieal meaning in this eontext, unlike states with A outside [0,1] have no meaning for partiele 
insertion or deletion simulations. For example, the expanded domains in our previous studies^^USo] 
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served no practical purpose since h„{Xn) < 0 represented a state where particles had an attractive 
atomic center, and h„(A„) > 1 represented a state “more than fully coupled.” 

The number of terms in Eq. (|^ will increase quadratically as the number of interaction sites in 
the solute increase, increasing the number of £,j and Ojj terms. However, geometric mixing rules 
can avoid such a large increase in terms. Details of how geometric mixing rules allow terms 

to be sol vent-independent are given in supplementary material in section S.l. 

3 Experimental Design 

Molecular dynamics (MD) simulations of a single particle in explicit TIP3P water were carried 
out with GROMACS 4.6.5*^^!^ compiled in double precision. NVT equilibration was carried out 
for 100ps, followed by NPT equilibration for 500ps, followed by NPT production simulations of 
6 ns per simulated parameter combination. Temperature was held at 298 K and coupled through 
Langevin dynamics with a time constant of 5ps. Pressure (for NPT simulations) was held at 
latm and coupled with a Parrinello-Rahman barostat,^22lSI] ^ith a time constant of 5ps, and a 
compressibility of 4.5 ■ 10^^ bar^^ 

Solvation properties were estimated over a grid of nonbonded parameters for the particle. For 
the 2-D case, the parameter ranges are 0.0239kcal/mol < £,/ < 0.8604kcal/mol (O.lkJ/mol < 
£ii < 3.6kJ/mol) and 0.25nm < Ou < 1.2nm. This range was chosen to include the largest possi¬ 
ble particles in the OPLS-AA force fieldwith additional parameters to test the limits of the 
reweighting methods. Solvation properties were calculated on a square grid of Eu and Ou with 151 
grid points in each dimension for 22,801 total parameter combinations. Initial grid points were 
distributed uniformly in Eu and uniformly in ofj so that sampling was done approximately propor¬ 
tional to the free energy of cavitation.l^^EII Relative solvation properties were computed from the 
reference parameters Ea = 0.1816kcal/mol, Ou = l.OlVOnm so the reference was roughly in the 
middle of the space. 

For the 3-D case, the parameter ranges are 0.0239 kcal/mol < Eu < 0.8604kcal/mol, 0.25 nm < 
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<Jij < 0.958nm, and —2.0 < qi < +2.0 in units of elementary eharge with eaeh dimension having 
51 points for 51^ grid points in the eube and a total of 132,651 parameter eombinations. To im¬ 
prove resolution in some of the images, 101 On states were estimated for 101 ■ 51^ grid points 
and 262,701 eombinations. The referenee state ehosen for this set test was En = 0.0502keal/mol. 
On = 0.5732 nm, and qi = 0.0. This set eovers partieles in the OPLS-AA foree field from hydrogen 
(bound to a earbon), through the largest ions. The referenee state was ehosen to show how proper¬ 
ties with low uneertainty ean be estimated to particles of very different sizes and charges through 
iteratively selecting new parameter combinations to simulate. The spacing for initial sampling for 
£n and On remains unchanged from the 2-D case, and the sampling in qj was done proportional to 
qj in keeping with Born theory for the free energy of solvation of charged spheres. This choices 
resulted in initial sampling at charges ±2.0000, ±1.8516, ±1.6903, ±1.5119, ±1.3093, ±1.0690, 
±0.7559, and 0.0000, all with the reference state choices of £n and On- Starting molecular geome¬ 
tries were generated with AMBERTOOLS’s LEaP^and initial equilibration was carried out with 
the reference state parameters. All other solutes started their equilibration process from the final 
frame of the reference ion’s NPT equilibration step. 

Details about specific algorithm to compute basis functions with GROMACS and all input files 
are included in the supplementary information for this article.^ The analysis code can be found 
on GitHub.^ 


4 Results and Discussion 

4.1 Solvation properties over a 2-D parameter space 

With the combination of methods described above, we can efficiently and accurately calculate 
the free energy of solvation and other thermodynamic properties over multidimensional parame¬ 
ter spaces. Eigure shows the free energy, and error in free energy of uncharged Eennard-Jones 
spheres evaluated at 151^ combinations of £n and On- The free energy differences were estimated 
using MBAR implemented in the pymbar package.l^ One of the main keys to making this cal- 
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culation feasible is that the linear basis function approach allows rapidly calculation of potential 
energies in post-processing. Reconstructing the potential energies required for free energy esti¬ 
mates through vector operations takes only seconds on a single core of a desktop computer’s CPU. 
The same evaluation of energies would have to be run through the inner force loops at all 151^ 
states without the linear basis function method, scaling as & {N{Ku+K^)) . We ran each sampled 
state’s trajectory through single point energy calculations with GROMACS estimating the poten¬ 
tial under every other sampled state thermodynamic conditions to quantify the computational cost. 
Each simulation of 30000 samples took over 1500 CPU seconds to re-evaluate the energies of 
the given trajectory. For reference, the average time to run the simulation on the same hardware 
was 25 CPU hours. If we make a conservative calculation and assume each re-run of the inner 
force loop took the minimum 1500 CPU seconds, the 12 sampled states would have taken 13 CPU 
years to run each configuration through the 151^ parameter combinations. This computational cost 
illustrates the primary speed improvement over re-running the inner force loop code, since time 
required to collect samples and estimate free energies is not affected by how the potential energies 
are computed in post-processing. 
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(a) Original 11 sampled states 


(b) Naive additional sampling at 12 states and spot checks 
of Lennard-Jones spheres 


Figure 2: The free energies at any combination of nonbonded parameters can be predicted 
in post-processing and regions of large uncertainty can be quickly found. Shown is the free 
energy (top panels) and statistieal error in the free energy (bottom panels) for 151^ parameter 
eombinations of Su and On with no eharge. Samples were drawn at loeations shown by an X and 
the drawn line is to guide the eye. The free energies are all relative to the referenee state shown 
by the diamond. shows the estimates drawing samples from 11 states only. A large region 
of uneertainty is seen at small On. (b) A 12* state was sampled in the region of high uncertainty, 


reducing the uncertainty in the whole region. The relative free energy of solvation was compared to 
chemically realistic spheres through direct simulation and is within error of the 5 AG. Free energy 
is shown in units of kcal/mol. 


For the 2D problem, regions of large uncertainty can quickly identified by visual inspection. 


and additional samples can drawn to reduce uncertainty. Fig. 2a shows the estimates of the solva¬ 
tion free energy when sampling from only 11 equivolume spaced states. We can see in the figure 
that parameter combinations with roughly On < 0.5 nm have high estimated uncertainty with re¬ 
spect to the reference state. We can naively sample by a single additional state in this region which 


drastically reduce the error in our estimation across this range as shown in Fig. 2b The error is a 
much steeper function of On than £/, in these ranges since large particles share virtually no config¬ 
uration space overlap with small particles in a dense fluid due to changes in the packing of solvent 
particles around small solutes. 

The linear basis functions approach reproduces the results from direct fixed-parameter solva¬ 


tion simulations. Fig. 2b is annotated to show where several chemically realistic Lennard-Jones 
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spheres fall in the parameter space. Solvation simulations were run for the parameters of united 
atom (UA) methanej ^^^* — ^ neopentane, ^ and a sphere roughly the size of a €50 molecule.^ The 
relative free energies are statistically indistinguishable between the direct solvation simulations 


and those computed in Fig. 2b The exact numbers and methods for the solvation simulations are 
shown in the supplementary material in Section S.2. 


4.2 Solvation properties over a 3-D parameter space 

Even visualizing the thermodynamic properties and their uncertainties in 3-D space is a nontrivial 
task. Iterative determination of optimal states to sample is significantly harder. Fig. shows the 
relative solvation free energy in the 3-D parameter space for three slices of fixed qi, and samples 
drawn from the initial 21 states. Because the reference state is an uncharged particle, there is 


large uncertainty in the relative solvation free energy to charged particles, as seen by Fig. 3a and 


Fig. 3c We show the entire 3-D space of solvation free energy as a function of the three force 
fields parameters in animated movie provided in the supplementary material.l^ 
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Figure 3: The free energies in a multidimensional nonbonded parameter space can be esti¬ 
mated and visualized rapidly. Shown are slices of the 3-D parameter space cube at fixed qi with 
the initial sampling of 21 states. Because the reference state is an uncharged particle, the error to 
charged parameters is large. The lack of configuration space overlap causes the uncertainty in the 
error estimate is large and unconverged. andshow samples of the free energy on either side 
of ^ = 0. |(b)|shows a discontinuous jump in uncertainty between the nearby charge = —1.52 in 


(a) an artifact of poor configuration space overlap. Animated movies showing the full free energy 


and uncertainty across the whole parameter space for initial and final samplings are included in the 
supplementary materials.!^ Free energy is shown in units of kcal/mol. 


Regions of poor configuration space overlap and large uncertainty can be visually identified in 


the initial simulations. Fig. 3a and Fig. 3b are taken from two nearby values of charge. One would 


expect the uncertainty to change smoothly with the partial charges differing by only 0.08, however 


the magnitude of the uncertainty changes by a factor of nearly two, causing a visual artifact. Unless 
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there is some sort of phase ehange in the system (whieh, as it turns out, there is not), these sorts 
of artifaet indieates that there is little eonfiguration spaee overlap, and thus both the free energies 
themselves as well as the estimate of the uneertainties have not eonverged. In order to improve 
our estimates, additional samples must be drawn at new parameter eombinations to improve the 
eonfiguration spaee overlap between our referenee point and parameter regions sueh as the ones in 
Fig.[^ However, deeiding exaetly where to put these new samples eannot be easily done by visual 
inspeetion as in the 2-D ease. We must identify an algorithmie ways of plaeing these new points 
that ean be easily automated, rather than having to do time-eonsuming manual trial-and-error. 

4.3 Adaptive sampling in 3-D parameter space and improving configuration 
space overlap 

As noted before, explieit thermodynamie paths need not be be direetly defined by the user if prop¬ 
erty estimates are made by reweighting samples spanning the eonfiguration spaee of all parameters 
of interest. A multistate statistieal analysis method sueh as MBArI^ takes into aeeount the eon¬ 
figuration spaee overlap from all samples relative to the state of interest. The eoneept of a single 
“path” is now obsolete sinee any two states are now eonneeted through a network of eonfigura- 
tion spaee overlap and eonneeted states. Suffieient eonfiguration spaee overlap between all of the 
states of interest and sampled states results in low statistieal error in estimating relative properties 
between states on the network. 

We define two types of eonfigurational spaee overlap that help us better deseribe the issues of 
searehing a multidimensional parameter spaee. We define the local configuration space overlap as 
the set of eonfigurations shared between simulations performed at an arbitrary parameter eombi- 
nation, and other nearby parameter eombinations up to some finite ±p away in parameter spaee, 
where p is one of the parameters of interest. We also define global configuration space overlap as 
the extent to whieh all points of interest in the parameter spaee are eonneeted to all other points 
in this parameter spaee through a eonneeted network of regions with loeal eonfiguration spaee 
overlap. 
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A key issue in sampling multidimensional parameter spaces is that the most naive adaptive 
sampling will lead to local configuration space overlap, but not global configuration space overlap. 
The intuitive place to put additional samples to improve uncertainties in a free energy calculation 
might initially be the parameter combinations where uncertainties are the largest. Sampling only 
the largest uncertainty point may improve the local configuration space overlap for states near this 
sampled state, but does not necessarily create a network of states with global configuration space 
overlap connecting the reference state and the state of interest. Without a connected network, the 
uncertainty in relative solvation free energy differences between states inside the local neighbor¬ 
hood of states becomes smaller, but uncertainty to the reference state, or any other state outside 
of the local configuration space overlap, will still be large. The presence of local overlap but not 
global overlap is easy to identify when sampling along a 1-D path, since it is easy to tell where 
along the path there are insufficient samples. Global overlap is harder to identify and create in 
higher dimensional space since it is much less obvious where new samples should be placed. We 
need not place samples in all places where they do not occur; we instead must identify approaches 
that can automatically construct a network of overlapping states. 

In our previous studies, optimizing the alchemical switches to identify the most statisti¬ 
cally efficient pathways worked well to optimize alchemical paths in one dimension. Analyzing 
the sample variance in thermodynamic integration (TI) along paths and optimizing the h„(A„) to 
reduced this variance identified the lowest error pathway. However, since we need a network of 
connected states in multidimensional space, optimizing TI would require reducing the multidimen¬ 
sional space to 1-D and connecting each state along a fixed path. However, since there are many 
possible ways to connect states in multidimensional space, optimizing TI in this multidimensional 
space would require connecting each state to every other state along a fixed path, resulting in 
(51^)!, more than 10^®^ possible overlapping paths, assuming the paths was restricted to visiting 
each state only once. We can instead create a collection of states that creates a global configuration 
space overlap network over the entire space by sampling discrete states in the multidimensional 
space, then analyzing the configuration space overlap between pairs of states with MBAR.I^ 
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We used an array of elustering and image reeognition algorithms, outlined in further detail 
in the supporting information, to find elusters of loeal phase spaee overlap and eonneet them, 
gradually improving global eonfiguration spaee overlap. Loeal phase spaee overlap was identified 
in the 51^ grid by identifying adjaeent points having nearly identieal statistieal uneertainty to the 
referenee state. When leaving these regions, the estimate of the statistieal uneertainty ehanges 
diseontinuously. Laek of eonfiguration spaee overlap between two elusters results in a nearly 
eonstant large uneertainty estimate between any two points in either eluster. Treating the grid of 
uneertainty estimates to the referenee state as a funetion of the three parameters, we interpreted 
this as a 3-D image. We used SeiPy’s^ multi-dimensional image proeessing module, ndimage, 
along with a density-based elustering algorithm, DBSCAN,!^ to identify elusters of grid points 
with loeal eonfiguration spaee overlap based on estimated statistieal uneertainty. 

We ehoose a new state to sample inside eaeh eluster of loeal eonfiguration spaee overlap. Eaeh 
eluster was treated as volume oeeupying shape in the parameter spaee, and eaeh shape had a bound¬ 
ary identified as the parameter eombinations on the border of the eluster. We ehose one new state 
inside a given eluster to sample at a random loeation inside the eluster. We then drew a series of 
line eonneeting this new state to the referenee state, and this new state to eaeh other eluster’s new 
state. We identify the interseetion of the series of lines with the eluster boundaries with SeiPy’s So- 
bel boundary deteetion algorithm.® Sampling the additional point on the boundary of the eluster 
provides a way to extend the area of loeal eonfiguration spaee overlap until elusters joined to ereate 
a network of eonfiguration spaee overlap, eventually ereating global eonfiguration spaee overlap; 
we detail how we ehoose point on the boundary below. We iteratively apply this until it ereates 
global eonfiguration spaee overlap through a network of eonneeted states. 

By using a variety of graph theory methods, we earn propose new states at loeal eonfiguration 
spaee boundaries that help to minimize the number of sampled states required to get good sampling 
aeross the parameter spaee. We briefly summarize the algorithm here, with additional details of 
the algorithm eovered in the supplementary material® in seetion S.3. The implementation used is 
available online.® We generated a eomplete, weighted graph where the new states identified inside 
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each cluster were the vertices, and each edge were the lines connecting the new states between 
clusters and the reference state. The weight of each edge connecting the vertices was computed 
by numerical integration of the uncertainty at uniformly spaced points in Euclidean space on the 
edge. The uncertainty of each integration point was estimated by multidimensional interpolation 
from nearby grid points since many integration points were not on the 51^ grid. A minimum 
spanning tree (MST) was created from this complete weighted graph using Kruskal’s algorithmic 
implemented in SciPy’s sparse graph routines. We sampled the vertices of the MST, residing 
inside a high uncertainty cluster, and the intersection of the MST’s edges with the boundary of the 
cluster as the final set of states sampled in the next iteration of the algorithm. This approach has 
the advantage of scaling to arbitrary A-dimensions, as direct visualization of higher dimensional 
spaces becomes increasingly difficult. 

Statistical uncertainties in the free energy differences between states are reduced by two orders 
of magnitude even though the amount of sampling is only increased by nine times (21 to 204 
states), because this adaptive algorithm generates good global configuration space overlap. Fig.|^ 
shows the same three slices of the 3-D parameter space as Fig. now with 203 sampled parameter 
combinations, all adaptively chosen except for the initial 21 combinations. We estimated properties 
at 101 On points for the figure to improve image quality. All time comparisons are made assuming 
51^ parameter combinations. During this process, the maximum error in relative solvation free 
energy differences was reduced from 53.405kcal/mol to 0.631 kcal/mol and the mean error was 
reduced from 16.162kcal/mol to 0.118kcal/mol. However, the initial uncertainty is a misleading 
underestimate as much of the parameter space had no global configuration space overlap with the 
reference state, meaning the error estimates are unconverged. 
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Figure 4: Adaptive sampling allows reduction of the uncertainty in the whole multidimen¬ 
sional nonhonded parameter space. Shown are slices of the 3-D parameter space cube at the 
same fixed qi as in Fig. The total uncertainty has been reduced by more than an order of mag¬ 
nitude with only a few adaptive iterations and a total of 203 sampled states. |(a)| and [(^ have 


significantly reduced error relative to their counterparts in Fig.[^ [(b^ no longer has the discontinu¬ 
ous uncertainty as it did in Fig. Animated movies showing the full free energy and uncertainty 
across the whole parameter space are included in the supplementary materials.^ Free energy is 
shown in units of kcal/mol. 


The consequences of the poor configuration space overlap can be seen in Fig. where the max¬ 
imum and mean uncertainty jumps at a certain iterations. The jumps in uncertainty indicate that a 
new region of poor configuration space overlap has been identified and partially sampled. Ways to 
monitor when global configuration space overlap has been reached are discussed in Section [43j 
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Figure 5: Discovery of regions of poor configuration space overlap appear as sudden jumps 
in the maximum uncertainty from between two iterations, but these regions are joined by 
sampling adaptively. The mean (solid) and maximum (dashed) uncertainty in the free energy for 
the 3-D nonbonded parameter combinations are plotted as function of iteration of the algorithm. 
The algorithm reduces the uncertainty of the largest areas of uncertainty before moving to oth¬ 
ers, where it can find regions of no configuration space overlap. Once some configuration space 
overlap is found through adaptive sampling, the uncertainty jumps as a more converged estimate 
can be made. The iterative process improves phase space overlap and lowers overall uncertainty. 
Uncertainty in free energy is shown in units of kcal/mol and shown using a logarithmic scale to 
show changes at both large and small maximum uncertainty. 


The adaptive sampling algorithm correctly places samples to reduce regions of poor configu¬ 
ration space overlap. Fig. shows all of the sampled states in a scatter plot of the 3-D parameter 
space. Subsequent adaptive iterations are shown in color scale ranging from blue for the initial 
iterations to red in the final iteration. The large clustering of points is expected at small On, small 
£ii, and large qi, because the water tightly rearranges around the particle due to very large Coulom- 
bic interactions. The tightly packed water arrangements share little no configuration space overlap 
with any other parameter combinations, so many samples at these states and connecting states are 
needed to accurately estimate properties. 
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Figure 6: The adaptive sampling algorithm samples the entire parameter space. The sampled 
states are shown as a scatter plot with successive iterations moving from initial (blue) to final 
(red) as a function of the 3-D parameter space. Many new states are selected at small Ou as large 
uncertainty in the region is identified from successive iterations. Fewer samples states are needed at 
intermediate On as the configuration space overlap to states already sampled is high. The reference 
state is shown as an X. 

The bottleneck of computing the energies is completely removed with the linear basis function 
approach. There are more than five times more states in the 3-D parameter (132,651) space than 
the 2-D space (22,801). Despite this, computational cost to compute the energies in 3-D space 
only increases to less than 30 CPU seconds for 21 sampled states, and just over 50 CPU seconds 
for 203 sampled states. Taking again the conservative average of 1500 CPU seconds needed per 
rerun, computing the energies at the 132,651 parameter combinations would have taken over 132 
and 1280 CPU years for 21 and 203 sampled states, respectively. Optimized code to explicitly 
calculate the basis functions at the time of the force calculation would allow even faster vector¬ 
ized calculations than the post-processing used here, allowing this method to scale to even larger 
multidimensional spaces. After removing this bottleneck, the main cost is in performing the 203 
simulations at sampled states and running MBAR. Our simulations took an average of 25 CPU 
hours per simulation to run, and MBAR calculations took 108 CPU hours to compute properties. 
We would need to invest the time to run simulations and compute properties with MBAR, inde¬ 
pendent of how the energies were computed. 

The algorithm could be further optimized depending on the relative cost of the simulations 
and of MBAR over very large numbers of states. In this case, simulations were relatively cheap 
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compared to MBAR. If the simulations were more expensive, then shorter simulations could be run 
between iterations. Additionally, more proposed states for direct simulation could be generated at 
each step to reduce wall time. For example, instead of 10 new states run for 10 ns, 20 new states 
could be run for 5 ns each. In general, shorter cycles of simulation plus analysis are expected 
to improve performance as within well-sampled regions, the error in free energy estimates scales 
approximately as Adding significant configuration space overlap between regions that are 

not connected will scale significantly better. 

4.4 Computing Other Thermodynamic Properties and Comparing to Re¬ 
ported Results 

Estimating properties over the entire multidimensional parameter space at once can provide ther¬ 
modynamic information which would otherwise require extensive simulations to compute at each 
thermodynamic state. This section looks at estimating five properties where significant simulation 
is required when sampling each state individually: the relative solvation entropy and enthalpy, the 
absolute solvation free energy of ions, the radial distribution function (RDF) focusing on the first 
hydration shell, and the difference between the Born solvation free energy and the simulation es¬ 
timate in the free energy of charging a particle. In this section, we show how we can compute the 
enthalpy and entropy from the same collected data as used for free energies, compare ion parameter 
free energies to show accuracy and limitations in our approach, estimate RDFs without generating 
trajectories at the target parameters, and identify trends in the deviation of solvation free energies 
from the Born solvation approximation. 

4.4.1 Relative Solvation Entropy and Enthalpy 

We can estimate the relative solvation entropy and enthalpy alongside the relative solvation free 
energy without additional sampling. The relative solvation enthalpy is the difference in solva¬ 
tion enthalpy, A//soiv> between the reference state X and any other state j in the multidimensional 
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parameter spaee. We eompute the relative enthalpy differenee, A {AHxj), as 


A{AHxj)^(AUxi). (7) 

Similarly the relative solvation entropy, TA (ASxj), is eomputed as 

- TA (ASxj) = A (AGxj) - A (AHxj) ( 8 ) 

where A [AGxj) is the relative free energy of solvation, and (■) is the statistieal expeetation value, 
eomputed here by MBAR.I^ In all eases, we report the differenee in thermodynamie properties 
with respeet to the referenee state. These properties we estimate have sueh large uneertainty from 
only the initial 21 states due to poor sampling that any number is essentially meaningless. Com¬ 
puting relative enthalpies and entropies generally requires signifieantly more samples to eompute 
than relative free energies,as only samples with loeal phase spaee overlap to the end states 
eontribute to preeision of expeetations of observables sueh as the enthalpy. 

Additional sampling reduees the uneertainty in the estimates of relative solvation entropy and 
enthalpy by orders of magnitude, but not to the same extent as the uneertainty in the free en¬ 
ergy. This is beeause whether or not there is good global phase spaee overlap does not ensure 
that a given state has good loeal phase spaee overlap with its neighbors. Fig. shows the relative 
solvation entropy and enthalpy estimations, along with uneertainty at 203 sampled states. The 
uneertainty smoothly transitions between adjaeent states, suggesting the estimates are numerieally 
eonverged. However, the maximum uneertainty is several orders of magnitude larger than the rel¬ 
ative solvation free energy. The maximum uneertainty in relative solvation entropy ehanges from 
an uneertain estimate to 0.193keal/ (mol-^), and the relative solvation enthalpy’s maximum un¬ 
eertainty drops to 57.5koal/mol. The mean uneertainty for relative solvation entropy and enthalpy 
fall to 0.0147keal/(mol-.ST) and 4.37koal/mol, respeetively. Although these error estimates are 
still too large to make praetieal predietions of solvation entropies, the estimates of the errors from 
the 203 states are well-defined, whieh is a marked improvement over the initial sampled states. 
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The whole configuration space has been sampled, and all states now have at least moderate local 
phase space overlap with its neighbors. Once decent estimates of properties are found, we can run 
additional simulations on states that have the most desirable preliminary estimates of properties. 



Figure 7: Uncertainty in entropy and enthalpy are also reduced with the uncertainty in the 
free energy. The enthalpy H, (|(a^ and |(b)|) and entropy S, (|(c)] and |(d)|) are computed from 203 


total sampled states and reported in kcal/mol. The uncertainty in both properties now smoothly 
transitions between adjacent states. The uncertainty is still significantly larger than that of the free 
energy, which is expected as these properties require more sampling to compute accurately than the 
free energy. Additional samples or other means of computing these properties would be required 
to reduce error further. 
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4.4.2 Ion Solvation Free Energies 


We compare the results from our study to a detailed ion parameter study by Joung and Cheatham.^ 
Their study parameterized Li+, Na+, K+, Rb+, Cs+, F^, Cl^, Br^, and in several water mod¬ 
els including TIP3P and compared them to experimental and computational studies from others. 
They parameterized the ions based on several experimental observables, including free energy of 
solvation. We compared their absolute solvation free energies to absolute solvation free energies 
we computed from our method in TableThe table shows the ion parameters, free energy of sol¬ 
vation, AG, and first hydration shell (FHS) location estimated from their work and our approach. 
We compute absolute solvation free energies for our work by adding the free energies from the 
relative simulation to those of a single set of solvation simulations of the reference particle along 
a soft core potential^^^*^ and a 1-1-6 parameterization.l^^I^ These simulations were run with the 
same conditions as described in section]^ at 11 states uniformly distributed along A from 0 to 1 of 
the soft core path. 

We re-computed parameter values and adjusted the reported free energies to make the values 
from Joung and Cheatham^^ comparable to this work. Their ion an was calculated for Lorentz- 
Berthelot mixing rules. We back calculated the On for geometric mixing rules in Table [^by setting 
the Oij between the ion and the oxygen in water equal in both mixing rules, giving the relation 


''lon-ion, geo 


(<7ion-ion. LB + <70W-0w)" 

4o'ow-ow 


(9) 


where Cion-ion, geo is the reported ion On in Table Cion-ion, lb is an for the Lorentz-Berthelot 
mixing rule reported by Joung and Cheatham, and Cow-ow is the TIP3P oxygen-oxygen ajj which 
we assumed was constant between the mixing rules. The solvation free energies from Joung and 
Cheatham have been adjusted by -1.9 kcal/mol to remove the correction they added for ideal 
gas expansion when comparing simulations, carried out with gas-phase standard states at 1 M, to 
experimental results, where gas-phase standard states are typically 1 atm.l^^l^ 
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Table 1: Absolute solvation free energies and first hydration shell (FHS) loeations eompared from this work and from Joung and 
Cheatham.l^ Ion Oij were baek ealeulated from Lorentz-Berthelot to geometrie mixing rules for ion/oxygen interaetions. £/, and AG are 
in keal/mol; On and FHS loeation are in nm. Error in this work’s FHS eomputed by 200 bootstrap samples^ with a diseretization of 
±0.0075 nm. 


Ion 

On 

^ii 

AG 

Joung and Cheatham 

This Work 

FHS 

Joung and Cheatham 

This Work 

Li+ 

0.1965 

0.0290 

-115.6 

-105.26 ±0.54 

0.196 

0.211 ±0.016 

Na+ 

0.2479 

0.0874 

-90.6 

-90.63 ± 0.44 

0.238 

0.234 ± 0.015 

K+ 

0.3039 

0.1937 

-72.6 

-72.42 ± 0.36 

0.275 

0.279 ± 0.016 

Rb+ 

0.3231 

0.3278 

-67.6 

-67.31 ±0.35 

0.292 

0.294 ± 0.030 

Cs+ 

0.3532 

0.4065 

-62.5 

-62.13 ±0.35 

0.311 

0.309 ± 0.021 

F- 

0.4176 

0.003364 

-121.6 

-120.91 ±0.45 

0.263 

0.257 ±0.016 

Cl- 

0.4617 

0.0356 

-91.5 

-90.99 ± 0.36 

0.313 

0.309 ± 0.028 

Br- 

0.4825 

0.0587 

-84.8 

-84.48 ± 0.36 

0.329 

0.333 ±0.015 

I- 

0.5396 

0.0537 

-75.9 

-75.89 ± 0.34 

0.351 

0.347 ± 0.017 


to 










The solvation free energies from our work and Joung and Cheatham’s work are within statistieal 
error. The free energies from this study appearing in Table are within two standard deviations 
of Joung and Cheatham^^ for all ions exeept Li+, whieh is outside the parameter range studied. 
The eomparable aeeuraey of our results to those of Joung and Cheatham provide validation for the 
free energies we report. Our method has the added benefit of eomputing the solvation free energy 
for arbitrary parameter eombinations. The ability to eompute properties for arbitrary parameter 
eombinations eomes from sampling only 203 states plus 11 for the absolute solvation free energies. 
Joung and Cheatham earned out 12-13 simulations for eaeh of the 9 ions, resulting in 108-117 
simulations for eomparison. We were able to eompute properties at roughly 14,000 times the 
number of parameter eombinations, for only double the simulation eost. 

Our method breaks down if the parameter eombination falls outside the defined range. The 
parameters for the Li+ ion in Table [T] fall signifieantly outside the range we searehed, namely, the 
On is less than the 0.25 nm minimum. The estimated free energy for this parameter eombination 
is not within statistieal error for that of Joung and Cheatham.l^ The estimated value for any ther- 
modynamie property at this ion will likely inaeeurate as the estimation is now an extrapolation 
instead of a thermodynamieally-eonsistent interpolation between sampled states. Estimates on pa¬ 
rameter eombinations falling just outside the searehed range, sueh as the Na+ ion, appear to still 
be aeeurate, so the range of eonvergenee of these ealeulations outside of samples parameters is not 
zero. 

4.4.3 Estimating Radial Distribution Functions 

We ean estimate the radial distribution funetion (RDF or g(r)) of a speeified parameter eombina¬ 
tion without explieitly sampling that eombination. The first hydration shell and the water RDF are 
properties that many have tried to eompute aeeurately and eompare to experiment . I64|67t l74| 
tionally, a RDF is generated by measuring the distanees between two speeified atomie groups (e.g. 
ion-water, water oxygen-water oxygen, ete.) generated over a trajeetory, eounting the number of 
pairs that are within a shell of size r + dr, then average over the shell volume and whole trajee- 
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tory. The fact that the RDF is an average property and dependent only on the configuration implies 
it is a thermodynamic equilibrium property which can be computed as a statistical observable. 
Computing an equilibrium expectation value of some observable, A, with MBArI^ is 

N 

{^)a = L ^naA{x„) (10) 

n=l 


where the summation runs over all samples in all states, W„a are the statistical weights from 
reweighting each nth drawn sample in the ath state, the A(x„) are the observed values from sample 
n and are functions of the configuration, x. The weight MBAR assigns to each sample n is® 


Wna = 


of a 




( 11 ) 


where fa is the reduced free energy (/3A or /3G) of state a, Ma(x„) is the reduced internal energy 
(j8t/) of the configuration x„ in state a, and Nk is the number of samples collected from state k. 
The observable for computing the RDF is the discrete count of pairs within a specified r-\-dr shell 
normalized by the shell radius and the number volume p = Aparticies/1^- We must estimate the RDF 
at multiple shell volumes in order to generate a complete RDF curve. 

The first hydration shells (FHS) are accurately predicted by the RDF estimation. Fig. shows 
the RDF estimated for the Li+ and the Cl ions from Joung and Cheatham.® The RDF is estimated 
at 160 discrete bins (0.0075 nm spacing) from r = 0 to r = 1.2 nm along with the error in that 
estimate. The black curves in Fig. [^are the estimates from MBAR computed purely from data 
collected at our 203 states, and not from any data drawn at the ion’s parameters. Error in the 
MBAR estimate is shown as dashed lines and is two standard deviations of the uncertainty in the 
RDF, also computed by MBAR. To validate the MBAR results, the green curves in Fig. show 
the RDF computed from simulations at the given ion’s parameters. The data from these direct 
simulations of the ions were not used in the MBAR estimate. The error in the green curves is taken 

to 


from 200 bootstrap samples'^ of the RDF for each ion. The Li+ ion RDF is shown in Fig. 8b 


again emphasize that estimates made outside the parameter range tend to break down, evidenced 


31 




by the erratic behavior in the RDF. 



(a) Gl¬ 


ib) Li+ 


Figure 8: Radial distribution functions (RDF) can be estimated at any parameter combination 
inside the parameter search range. The ion-oxygen RDF in TIP3P water is shown for Cl 
and Li+ with Joung and Cheatham parameters.!^ The RDFs are estimated without sampling the 
explicit parameter combinations using 160 discrete bins. Estimates made from parameters inside 
the parameter range, CD (a) are accurate as the first hydration shell is predicted within error 
to Joung and Cheatham.!^ in Table Estimating parameters which fall outside the searched 


parameter range, Li+ (b) are inaccurate. Error is shown dashed lines of two standard deviations. 


computed by MBAR for the black curves and 200 bootstrap samples for the green curves. 


The Cl ion in Eig. 8a shows an example where we can estimate solvation structure and im¬ 
prove future simulations. We further validated the RDE calculation by determining the peak of the 
first hydration shell for every ion from Joung and Cheatham as the bin with highest occupancy!^ 
and compared our results to theirs in Table The peaks we estimate and those from Joung and 
Cheatham are in agreement with each other, within the error of our bin width of 0.0075 nm. The 
RDE curves generated using reweighting are not as smooth as the RDE curves from other studies, 
and we do not expect them to be smooth with the range of parameters we searched. However, all 
features are well preserved. This approach can be used to broadly search parameter space and gen¬ 
erate approximate RDE’s, until those that replicates the RDE or hydration shell properties of inter¬ 
est are identified. Eurther simulations could then be run around the sets of properties which gave 
the RDE replicating the target properties to make more accurate estimates, resulting in searches 
over a much narrower parameter space than examined through here. A complete set of the RDEs 
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estimated via reweighting and direet simulation for every ion is ineluded in the supplementary 
material.^ 


4.4.4 Born Approximation to Solvation Free Energy 

The Bom approximation to solvation free energy measures the effort to transfer a eharged partiele 
between two dieleetries. The free energy differenees for this approximation of transferring a hard 
sphere partiele between vaeuum and a fluid is 


AG 


4;reo^,y \£d ) 


( 12 ) 


where = 92 is the estimated dieleetrie eonstant of our fluid, TIP3P water ,1^ and Rij is the Born 
radius. 

We ean estimate the Born radius of any partiele in our seareh spaee from our sampled states. 
Choosing the eorreet Bom radius, or effeetive hard sphere (EHS) radius is a nontrivial task. How¬ 
ever, we ean estimate the EHS radius with our RDE ealeulation. We first eompute the RDE for a 
given parameter eombination, g{r), then eompute the EHS radius by determining an ro where the 
following eonditions for the oxygen-ion g(r) are met: 


g(ro -Sr)=0 

(13) 

g{ro) = 0 

(14) 

g(ro + 5r) >0 

(15) 


to a toleranee of 10^^. ro ean be interpreted as the point on g{r) where the probability of finding a 
partiele ehanges from zero to nonzero. We set Rjj = ro for the Bom solvation ealeulations. 

We applied eorreetion terms to our estimated free energies to remove errors introdueed by our 
ehoiee of simulation settings and water model. These eorreetions allow a eomparison of free energy 
between different methods without having a methodologieal dependenee. All of the eorreetions 
we applied are detailed in Hiinenberger and Reif^^ and we go through explieit detail of whieh 
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corrections we applied and why in the supplementary materially in section S.6. 

There are a number of reasons the the Born approximation is not perfect for our particle- 
water system. These imperfections come from not simulating a truly infinite medium, the water 
model having an asymmetric charge distribution, Lennard-Jones terms affecting the free energy 
of transfer, and the fact that Rij may change on charging since we have soft particles. We are 
interested in identifying deviations from the Born approximation with our method, given that we 
fully expect deviations from these imperfections. 

The Bom approximation to the solvation free energy is the free energy of transferring a charged 
hard sphere with radius Rij, not for the free energy of solvating the uncharged particle. To remove 
this dependence on the cavitation free energy ,y we we estimate the Rij from the RDF as described 
above for each uncharged combination of Oij and £ij, then calculate the free energy difference 
to the same values of Oij and £ij but with a charge. This allows us to compare our free energy of 
charging to the Born approximation to the free energy of charging, and identify deviations between 
the model and our simulation. We do not recalculate Rij at the end state. 

Both trends and failures of the Born approximation can be easily visualized for the entire pa¬ 
rameter space. Fig. shows the difference between the Bom approximation (AGeom) and this 
work’s estimate for the charging free energy (AGtw)- Any deviation from AAGxw-Bom = 0 in¬ 
dicates nonidealities relative to Born approximation to the free energy of charging. There are 
several deviations which can be seen in the figure. The first deviation is the Born free energy 
generally predicts less favorable solvation free energy for both signs on the charged particles 
(AGtw < AGBom < 0). However, this deviation is asymmetric as the deviation in from Born theory 
of the positively charged particle at q = +2 is up to —152.1 kcal/mol, but the negatively charged 
particle only deviates up to —78.5kcal/mol at q ~ —2. The charging free energy more strongly 


depends on Ou for the positive particle as the A (AG) in Fig. 9a spans 13.7 kcal/mol from Ou ~ 0.5 
to On ~ 0.95 on average, whereas in Fig. the span 27.6 kcal/mol on average in the same range 
of On. We can also observe the opposite case where the Bom estimate is more favorable than the 


observed estimate (AGeom < AGtw < 0) in Fig. 9a where A (AG) > 0 at small On and £,,. This 
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opposite case occurs because the negatively charged particle attracts the water’s hydrogens, which 
do not have Lennard-Jones interactions, and would be able to approach much more closely than 
the Born model predicts with its hard sphere approximation. Simply estimating free energy at a 
few states in the parameter space would have been insufficient to observe these broad trends as a 
significant degree of interpolation, or worse extrapolation, would have been required. 
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Figure 9: Trends and failures in approximations can be visualized over wide parameter space. 

The deviation of the Born hydration free energy from the computed solvation free energy is shown 
for two fixed slices of qt. Explicitly shown is the Bom free energy minus this work’s free energy 
estimates: AGeom — AGtw- Free energy difference estimates for each combination of an and etj 
are relative to the same combination at qi = 0 as to approximate only the contribution of charging a 
given sphere in solvent, [(a)] and [(b)] show the deviations with the solute carrying a rLq charge. The 
Bom model generally predicts a more favorable interaction relative to the simulation. An exception 
to this trend is at very small an and £n for a negatively charged particle where it predicts a less 
favorable interaction than the simulation as the TIP3P water hydrogens can tightly pack around the 
particle. Animated movies showing the full free en^y and uncertainty across the whole parameter 
space are included in the supplementary materials .^3 Free energy is shown in units of kcal/mol. 


A full animation showing AACrw-Bom at combination in the parameter space is included in the 
supplementary material.^ 
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4.5 Monitoring numerical bias 

The numerical bias caused by the process of calculating energies from perturbations of reference 
states can be minimized. Eq. @ and in the supplementary material Eq. S.5 involve the addition 
and subtraction of many small numbers, depending on reference states X and Y. This can lead 
to rounding errors which may propagate to the simulation package and be made worse by the 
software’s precision. Choosing reference states with larger ACn,xY can help reduce accumulation 
of error. If the software natively allows access to the basis function values, then this source of 
numerical error is eliminated. However, we must quantify our numerical error here since the 
perturbation approach was chosen. 

We find that rounding errors do not propagate from the perturbed basis function representation 
to the thermodynamic properties for these calculations. The rounding errors for this system were 
checked by evaluating the energy of each sampled configuration at every sampled state as though 
we did not use the basis function approach. The energies of every configuration evaluated at every 
sampled state computed from directly from GROMACS were compared to the energies computed 
from the basis functions and any deviation was a result of numerical bias. The energies computed 
from basis function calculations and the energies directly from GROMACS reruns differed by less 
than 0.002%. This very small relative error does not assure that errors themselves are negligible, 
since large energies have large absolute errors. The largest absolute error in all 203 simulations 
was was 11.804kcal/mol, which at first appears is likely to have a significant effect in the final 
answers. However, these large absolute errors do not affect any of the property estimates. This 
is because these large rounding errors occur when the trajectory from a particle with small On is 
evaluated in a force field for a particle with large On. This often resulted in the oxygen of TIP3P 
water being within the large particle’s excluded volume, resulting in a highly repulsive interaction. 
Every configuration with rounding error in this study had a Boltzmann weight, exp(—/3t/(r, A)), 
indistinguishable from zero at machine precision, and thus these errors do not contribute to any of 
the properties of interest. The energies calculated using reference states thus give results that are 
sufficiently close to those from direct evaluation of the energies for all uses. 
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4.6 Convergence and alternate algorithm conditions 

Examining the uncertainty estimate from the reference state alone is insufficient to determine con¬ 
vergence of the calculations. The multidimensional space initially has many regions of configura¬ 
tion space overlap, which causes unconverged estimates of the properties and their uncertainties. 
Poor overlap implies that the mean and the maximum uncertainty alone are not appropriate gauges 
for convergence since the error does not consistently decrease with number of samples as observed 
in Fig. 1^ As discussed above, a network of overlapping configuration space between all states 
required for accurate estimate of properties and quantification of the uncertainties in these proper¬ 
ties, and we need a way to diagnose whether this network has been created at a given stage of the 
simulation. 

The configuration space overlap can be analyzed through a multidimensional extension of the 
Overlapping Distribution Method.This method can quantify the overlap between states by 
considering the probability of each sample occurring in every state. The unnormalized probabil¬ 
ity of a sample can be computed from its Boltzmann weight. Just as each sampled configura¬ 
tion carries a Boltzmann weight for the state in which it was drawn, the configurations can be 
reweighted to all other states to determine what the relative Boltzmann weights are in the other 
sampled states.^^^^^^®^^^MBARl^ stores each sample’s weights as a matrix W, the same matrix 
of weights discussed earlier in the context of expectations whose entries are W„a. The pairwise 
probabilities can be assembled in what we term the “overlap matrix,” constructed from the matrix 
of weights. This multidimensional overlap matrix is calculated from the weights as 

0 = W'^WN (16) 

where N is a diagonal matrix with each ith entry equal to the number of samples from the ith 
state .122E^ The individual elements of the overlap matrix, Ojj, can be read as the probability of a 
sample generated in state j being observed in state i. Since O is a Markov matrix, it can be shown 
that O will have at least one eigenvalue of 1, which is also the maximum over all eigenvalues. All 
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other eigenvalues will be real and positive.l^ However, multiple eigenvalues of 1 in O indieate 
that there are discontinuous regions of sampled configuration space and O can be rearranged to 
a block diagonal matrix that illustrates which states are and are not connected. Thermodynamic 
property measurements made between the discontinuous configuration spaces will have undefined 
uncertainty, which numerically can show up as either NaN or very large numbers that change 
dramatically with small changes in sampling.l^ 

Monitoring both the eigenvalues of the overlap matrix and maximum uncertainty can provide 
good guidance as to when converged property estimates have been reached. Monitoring exclusively 
the eigenvalues of O is insufficient to determine convergence over the entire parameter space since 
O only involves the sampled states. However, if the sampled states are well-enough dispersed 
such that the estimated uncertainties of all the unsampled states are low, and simultaneously all the 
sampled states are connected as demonstrated by having a single eigenvalue with value 1 for O, 
we can have high confidence that the uncertainty estimates are reliable. 

We therefore defined our property estimates as converged once there were no repeated eigen¬ 
values of 1 in O, and once no further clusters of uncertainties in the relative free energy above the 
target threshold are found, i.e. the clustering algorithm can not find new points adjacent grid points 
with large uncertainty. If desired, the uncertainty can be iteratively reduced by lowering the error 
threshold of the algorithm. The deviation of the five largest eigenvalues from 1 are shown for each 
iteration in the supplementary material in Table S.2. 

The proposed adaptive sampling algorithm is only an initial proposal, and sampling regions of 
poor configuration space overlap could be improved by changes to this algorithm. For example, 
the algorithm discussed here and detailed in the supplementary information^^Sl identified clusters 
of high relative uncertainty by counting the number of grid points in the cluster. This resulted in 
many states being chosen adaptive at larger Ou due to the density of grid points, despite the fact 
that most of the regions with no phase space overlap were at On < 0.35 nm. Although states were 
eventually placed at small Ou, one improvement could be to place points in regions with the largest 
integrated uncertainty, using the uncertainty as a weighting on the overall number of grid points. 


38 


This improvement would still favor the larger elusters, but to a lesser extent as new poekets of poor 
eonfiguration spaee overlap are identified and the uneertainty jumps baek up as in Fig. 

The eurrent study was used to determine high aeeuraey free energies over entire large parameter 
range, and this study’s range of nonbonded parameters will likely exeeed many praetieal appliea- 
tions. However, this parameter seareh method eould be adaptively shrunk to hone in on speeifie 
property estimates. Instead of determining the thermodynamie properties for a large set of starting 
parameters, a desired thermodynamie property eould be provided and a set of parameters whieh 
generate this property are searehed for, as is the ease for reverse property predietion. In this ease, 
the initial grid spaeing eould be larger, and a rough estimate of the property surfaee ean be ae- 
quired, spending less simulation time per iteration. Eaeh subsequent iteration would then narrow 
the seareh area and reduee the grid spaeing, seeking the target value. States from previous iter¬ 
ations outside the narrowed seareh spaee, ean still be ineluded in the analysis using, preventing 
disearded information. Alternatively, eomputational time ean be saved by exeluding these outlier 
states if analysis of O shows that these states are not aetually eonneeted to the states ultimately of 
interest. 

Thermodynamie property estimates are not limited to relative solvation free energies, entropies, 
and enthalpies. Onee the simulations are eonverged, further thermodynamie properties ean be 
derived from derivatives and fluetuations with respeet to V, P, and other 

property eomputed from statistieal expeetation values. 

5 Conclusion 

We have shown how one ean rapidly estimate thermodynamie properties in a multidimensional 
nonbonded parameter spaee by eombining two time saving advantages. Computing the energies 
required for estimating thermodynamie properties ean be aeeelerated with linear eombinations of 
basis funetions instead of re-running simulation foree loops. Estimating the thermodynamie prop¬ 
erties in the multidimensional parameter spaee is possible with the binless, multidimensional, path- 
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free statistieal method, MBAR. With these methods, properties are estimated at all states of interest 
simultaneously without the need define how any state is eonneeted to any others beforehand. 

Converged results ean be aequired by adaptively sampling the multidimensional parameter 
spaee, ereating a network of globally-eonneeted eonfiguration spaee overlap between all states. 
Simply adding samples in regions of large uneertainty does not neeessarily ereate eonfiguration 
spaee overlap to all states, as the uneertainty of differenees to other states is not redueed. Regions 
of poor eonfiguration spaee overlap ean be identified by examining the overlap matrix between all 
sampled states, and the maximum uneertainty in the unsampled states. The parameter spaee is then 
adaptively sampled until eonfiguration spaee overlap is ereated between the referenee state and all 
other states of interest, and uneertainties are pushed suffieiently low for the purpose at hand. 

The methods shown here ean help speed up future thermodynamie property searehes in multi¬ 
dimensional parameter spaee. So long as the energy funetions ean be eomputed with veetor oper¬ 
ations, and do not require re-running the simulation foree loops, these methods ean seale to even 
higher dimensionality and extend to other thermodynamie properties. Re-writing the simulation 
eode to direetly provide the required basis funetions would allow even faster energy evaluation, 
potentially removing the need to ever eompute the energy of a eonfiguration more than onee. 

6 Acknowledgments 

The authors would like to thank Kyle Beauehamp and John Chodera at Memorial Sloan-Kettering 
Caneer Center for feedbaek and assistanee in running optimized MBAR eode. The authors also 
aeknowledge the support of NSF grant CHE-1152786. 

References 

(1) Wang, C. C.; Pilania, G.; Boggs, S. A.; Kumar, S.; Breneman, C.; Ramprasad, R. Polymer 
2014, 55, 979-988. 

(2) Harini, M.; Adhikari, J.; Rani, K. Y. Ind. Eng. Chem. Res. 2013, 52, 6869-6893. 


40 



(3) Baron, R., Ed. Computational Drug Discovery and Design-, Humana Press: New York, 2012; 

p628. 

(4) Li, H.; Zheng, M.; Luo, X.; Zhu, W.; Jiang, H. Chem. Biol', John Wiley & Sons, Ine., 2012; 
pp 23-40. 

(5) Hansen, N.; van Gunsteren, W. L. J. Chem. Theory Comput. 2014, 2632-2647. 

(6) Yang, Q.; Liu, D.; Zhong, C.; Li, J.-r. Chem. Rev. 2013, 113, 8261-8323. 

(7) Montieelli, L.; Tieleman, D. In Biomol. Simulations SE - 5; Montieelli, L., Salonen, E., Eds.; 
Methods in Moleeular Biology; Humana Press, 2013; Vol. 924; pp 197-213. 

(8) Maekerell, A. D.; Bashford, D.; Bellott, M.; Dunbraek, R. L.; Evanseek, J. D.; Lield, M. J.; 
Liseher, S.; Gao, J.; Guo, H.; Ha, S.; Kuehnir, L.; Kuezera, K.; Lau, L. T. K.; Mattos, C.; 
Miehniek, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Sehlenkrieh, M.; 
Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wio, J.; Yin, D.; Karplus, M. J. Phys. Chem. 
B 1998, 5647, 3586-3616. 

(9) Vanommeslaeghe, K.; Hateher, E.; Aeharya, C.; Kundu, S.; Zhong, S.; Shim, J.; Darian, E.; 
Guveneh, O.; Lopes, P; Vorobyov, L; Jr., A. D. M. J. Comput. Chem. 2011, 31, 671-690. 

(10) Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. E. J. Comput. Chem. 2004, 25, 
1656-76. 

(11) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. J. Am. Chem. Soc. 1996, 7863, 11225- 
11236. 

(12) Kaminski, G. A.; Eriesner, R. A.; Tirado-rives, J.; Jorgensen, W. L. J. Phys. Chem. B 2001, 
2,6474-6487. 

(13) Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; Cheatham, T. E.; DeBolt, S.; 
Eerguson, D.; Seibel, G.; Kollman, P. Comput. Phys. Commun. 1995, 91, 1-41. 


41 



(14) Wang, J.; Wolf, R. M.; Caldwell, J. W; Kollman, P. A.; Case, D. A. J. Comput. Chem. 2004, 
25, 1157-74. 

(15) van Gunsteren, W. E; Mark, A. E. J. Chem. Phys. 1998, 108, 6109. 

(16) van Gunsteren, W. E; Bakowies, D.; Baron, R.; Chandrasekhar, L; Christen, M.; Daura, X.; 
Gee, R; Geerke, D. R; Glattli, A.; Hunenberger, P. H.; Kastenholz, M. A.; Oostenbrink, C.; 
Schenk, M.; Trzesniak, D.; van der Vegt, N. E A.; Yu, H. B. Angew. Chem. Int. Ed. Engl. 
2006, 45, 4064-92. 

(17) Abrams, D. S.; Prausnitz, J. M.AIChEJ. 1975, 21, 116-128. 

(18) Fredenslund, A.; Jones, R. L.; Prausnitz, J. W.AlChE J. 1975, 21, 1086-1099. 

(19) Frenkel, D.; Smit, B. Understanding Molecular Simulation, 2nd ed.; Academic Press: San 
Diego, 2002; p 638. 

(20) Pitera, J. W.; van Gunsteren, W. F. Mol. Simul. 2002, 28, 45-65. 

(21) Blondel, A. J. Comput. Chem. 2004, 25, 985-993. 

(22) Crooks, G. E. Phys. Rev. Lett. 2007, 99, 10-13. 

(23) Steinbrecher, T.; Mobley, D. L.; Case, D. A. J. Chem. Phys. 2007, 127, 214108. 

(24) Hritz, J.; Oostenbrink, C. J. Chem. Phys. 2008, 128, 144121. 

(25) Riniker, S.; Christ, C. D.; Hansen, H. S.; Hunenberger, P. H.; Oostenbrink, C.; Steiner, D.; 
van Gunsteren, W. F. J. Phys. Chem. B 2011, 115, 13570-7. 

(26) Buelens, F. P; Grubmiiller, H. J. Comput. Chem. 2011, 25-33. 

(27) Pham, T. T.; Shirts, M. R. J. Chem. Phys. 2011, 135, 034114. 

(28) Pham, T. T.; Shirts, M. R. J. Chem. Phys. 2012, 136, 124120. 


42 



(29) Naden, L. N.; Pham, T. T.; Shirts, M. R. J. Chem. Theory Comput. 2014, 10, 1128-1149. 

(30) Naden, L. N.; Shirts, M. R. J. Chem. Theory Comput. 2015, 11, 2536-2549. 

(31) Shirts, M. R.; Pitera, J. W.; Swope, W. C.; Pande, V. S. J. Chem. Phys. 2003, 119, 5740. 

(32) Steinbrecher, T.; Joung, L; Case, D. A. J. Comput. Chem. 2011, 32, 3253-63. 

(33) Mobley, D. L.; Bayly, C. L; Cooper, M. D.; Shirts, M. R.; Dill, K. A. J. Chem. Theory 
Comput. 2009, 5, 350-358. 

(34) Weinhold, F. J. Chem. Phys. 1975, 63, 2479. 

(35) Bennett, C. H. J. Comput. Phys. 1976, 268, 245-268. 

(36) J. Phys. Chem. B 2003, 107, 9535-9551. 

(37) Wu, D.; Kofke, D. A. J. Chem. Phys. 2005, 123, 084109. 

(38) Paliwal, H.; Shirts, M. R. J. Chem. Theory Comput. 2013, 9, 4700-4717. 

(39) Shirts, M. R.; Chodera, J. D. J. Chem. Phys. 2008, 129, 124105. 

(40) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. 
Chem. 1992, 13, 1011-1021. 

(41) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. 
Chem. 1995, 16, 1339-1350. 

(42) Zwanzig, R. W. J. Chem. Phys. 1954, 22, 1420. 

(43) Shirts, M. R.; Pande, V. S. J. Chem. Phys. 2005, 122, 144107. 

(44) Paliwal, H.; Shirts, M. R. J. Chem. Theory Comput. 2011, 4115-4134. 

(45) de Ruiter, A.; Oostenbrink, C. J. Chem. Theory Comput. 2012, 8, 3686-3695. 


43 



(46) Beutler, T.; Mark, A.; van Schaik, R. Chem. Phys. Lett. 1994, 222, 529-539. 

(47) Zacharias, M.; Straatsma, T. R; McCammon, J. A. J. Chem. Phys. 1994, 100, 9025. 

(48) Pronk, S.; Pall, S.; Schulz, R.; Larsson, P; Bjelkmar, P; Apostolov, R.; Shirts, M. R.; 
Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. Bioinformatics 2013, 29, 
845-54. 

(49) GROMACS. http : / /www. gromacs . org/ (accessed Dec 02, 2013). 

(50) Parrinello, M.; Rahman, A. J. Appl. Phys. 1981, 52, 7182. 

(51) Nose, S.; Klein, M. L. Mol. Phys. 1983, 50, 1055-1076. 

(52) Huang, D.; Geissler, P; Chandler, D. J. Phys. Chem. B 2001, 105, 6704-6709. 

(53) Grigoriev, F. V.; Basilevsky, M. V.; Cabin, S. N.; Romanov, A. N.; Sulimov, V. B. J. Phys. 
Chem. B 2007, 111, 13748-55. 

(54) Case, D. A.; Darden, T. A.; Cheatham, T. E.; Simmerling, C. E.; Wang, J.; Duke, R. E.; 
Euo, R.; Walker, R. C.; Zhang, W.; Merz, K. M.; Roberts, B.; Hayik, S.; Roitberg, A.; 
Seabra, G.; Swails, J.; Goetz, A. W.; Kolossvary, L; Wong, K.; Paesani, E; Vanicek, J.; 
Wolf, R. M.; Eiu, J.; Wu, X.; Brozell, S.; Steinbrecher, T.; Gohlke, H.; Cai, Q.; Ye, X.; 
Hsieh, M.-J.; Cui, G.; Roe, D. R.; Mathews, D. H.; Seetin, M. G.; Salomon-Ferrer, R.; 
Sagui, C.; Babin, V.; Euchko, T.; Gusarov, S.; Kovalenko, A.; Kollman, P. A. AMBER 12', 
University of California: San Francisco, 2012. 

(55) For details, please see the supplementary information for this article. 

(56) Analysis code for this project can be found in the ion-parameter repository on GitHub 

at https://github.com/shirtsgroup/ion-parameters with commit hash 
7ffl88b0bc or later. 

(57) Kuharski, R. A.; Rossky, P. J. J. Am. Chem. Soc. 1984, 106, 5794-5800. 


44 




(58) SciPy: Open Source Scientific Tools for Python, http: //www. scipy.org/ (accessed 
May 3, 2014). 

(59) Ester, M.; Kriegel, H.-p.; S, J.; Xu, X. A density-based algorithm for discovering clusters in 
large spatial databases with noise. 1996; pp 226-231. 

(60) Sobel, L; Feldman, G. "A 3x3 Isotropic Gradient Operator for Image Processing. Stanford 
Artif. Intell. Proj. 1968. 

(61) Kruskal, J. B. Proc. Am. Math. Soc. 1956, 7, 48-50. 

(62) Fenley, A. T.; Muddana, H. S.; Gilson, M. K. Proc. Natl. Acad. Set U. S. A. 2012, 109, 
20006-20011. 

(63) Fenley, A. T.; Killian, B. J.; Hnizdo, V.; Fedorowicz, A.; Sharp, D. S.; Gilson, M. K. J. Phys. 
Chem. B 2015, 118, 6447-6455. 

(64) Joung, I. S.; Cheatham, T. E. J. Phys. Chem. B 2008, 112, 9020-41. 

(65) Grossfield, A.; Ren, P; Ponder, J. W. J. Am. Chem. Soc. 2003, 125, 15671-15682. 

(66) Efron, B.; Tibshirani, R. An Introduction to the Bootstrap', Chapman & Hall/CRC, 1993; p 
436. 

(67) Jensen, K. P; Jorgensen, W. F. J. Chem. Theory Comput. 2006, 2, 1499-1509. 

(68) Aqvist, J. J. Phys. Chem. 1990, 94, 8021-8024. 

(69) Beglov, D.; Roux, B.; He, C. J. Med. Phys. 1994, 100, 9050-9063. 

(70) Smith, D. E.; Dang, F. X. Chem. Phys. Lett. 1994, 230, 209-214. 

(71) Dang, F. X. J. Chem. Phys. 1992, 96, 6970. 

(72) Dang, F. X. Chem. Phys. Lett. 1994, 2-5. 


45 



(73) Dang, L. X. J. Am. Chem. Soc. 1995, 117, 6954-6960. 

(74) Dang, L. X.; Garrett, B. C. 7. Chem. Phys. 1993, 99, 2972. 

(75) Lamoureux, G.; a.D. MacKerell,; Roux, B. J. Chem. Phys. 2003, 119, 5185-5197. 

(76) Hiinenberger, R; Reif, M. Single-Ion Solvation: Experimental and Theoretical Approaches 
to Elusive Thermodynamic Quantities', RSC theoretical and computational chemistry series; 
Royal Society of Chemistry, 2011. 

(77) Pohorille, A.; Jarzynski, C.; Chipot, C. J. Phys. Chem. B 2010, 114, 10235-53. 

(78) Tan, Z. J. Am. Stat. Assoc. 2004, 99, 1027-1036. 

(79) Klimovich, R; Shirts, M.; Mobley, D. Journal of Computer-Aided Molecular Design 2015, 
29, 397-411. 

(80) Aimoli, C. G.; Maginn, E. J.; Abreu, C. R. Eluid Phase Equilib. 2014, 368, 80-90. 

(81) Avendano, C.; Lafitte, T.; Galindo, A.; Adjiman, C. S.; Jackson, G.; Muller, E. A. J. Phys. 
Chem. B 2011,115, 11154-11169. 

(82) Allen, E. H.; Kennard, O.; Watson, D. G.; Brammer, E.; Orpen, A. G. J. Chem. Soc. Perkin 
Trans. 11 1987, 1695-1914. 


46 



